perm filename P[AP,DBL]4 blob
sn#150423 filedate 1975-03-18 generic text, type T, neo UTF8
.DEVICE XGP
.!XGPCOMMANDS←"/TMAR=50/PMAR=2200/BMAR=100"
.FONT 1 "BASL30"
.FONT 2 "BASB30"
.FONT 3 "NGR25"
.FONT 4 "BASI30"
.FONT 5 "NGR40"
.FONT 6 "NGR20"
.FONT 7 "FIX20"
.TURN ON "↓_π{"
.TURN ON "⊗" FOR "%"
.PAGE FRAME 43 HIGH 91 WIDE
.AREA TEXT LINES 1 TO 38
.TITLE AREA FOOTING LINE 43
.!XGPLFTMAR←100
.SPACING 100 MILLS
.PREFACE 230 MILLS
.NOFILL
.PREFACE 45 MILLS
.FILL
.NEXT PAGE
.MACRO B ⊂ BEGIN VERBATIM GROUP ⊃
.MACRO BB ⊂ BEGIN NOFILL SKIP 2 SELECT 3 INDENT 0 GROUP ⊃
.MACRO E ⊂ APART END ⊃
.TABBREAK
.PAGE←0
.EVERY FOOTING(⊗7{DATE},Lenat,page {PAGE}⊗*)
.NEXT PAGE
.BEGIN CENTER
⊗5SYNTHESIS OF LARGE PROGRAMS FROM SPECIFIC DIALOGUES⊗*
Douglas B. Lenat
Stanford Artificial Intelligence Laboratory
Stanford, California
⊗5↓_ABSTRACT_↓⊗*
.END
Automatic programming must eventually deal with large programs.
Using the paradigm of dialogue, a preliminary study was done on
generating programs tens of pages long, requiring
several hours of user-system interaction time to synthesize. Many
assumptions which are reasonable when concentrating upon tiny
targets become limiting factors in large dialogues. An
experimental system, PUP6, was constructed.
The methods it employs to generate target code are not
formal, but rather involve structuring of knowledge about
programming, about the chosen task domain (inductive inference
LISP programs), and about transfer of control. Specification is
via long, brittle dialogues with the user, but is only partial:
ambiguities are resolved by inference and deferral.
Knowledge is represented as a pool of structured modules, whose
abilities include asking and answering questions posed by each
other and by the user. Inadvertently, a new style of target
program was synthesized: like PUP6, it can be interrupted as it runs
and queried about what it's doing.
This research revealed some major difficulties
the "next generation" of nonformal automatic
program synthesis systems must face.
.B
.E
.ONCE CENTER
⊗5↓_1. Motivation_↓⊗*
Many AI researchers (e.g., Sussman, Goldstein) have experimented with the
generation of small programs, from partial specifications in an ambiguous
language. They recognize
that larger capabilities are prerequisite to utility, but (correctly)
argue that mastery of toy tasks must precede mastery of real ones.
Efforts on synthesis of ⊗4large⊗* programs concentrate on
efficient but routine transformations, compilations
of well-specified programs in higher languages
(e.g., by heuristic compilers).
Despite the small chance for success in aiming at ambitious targets,
it is dangerous to develop techniques applicable only to small tasks.
The analogy between understanding the
construction of a 10-line program, in five minutes of dialogue, and
understanding the
construction of a 100-page program, in five hours of dialogue, is at
best superficial.
A system, PUP6, was partially implemented to study large
code generation tasks. It did
ultimately synthesize three LISP programs: a structural concept formation
program, a grammatical inference program, and a simple property
list maintenance routine (referred to as CF, GI, and PL). Some unanticipated
difficulties were encountered which may be inherent in syntheses of this
magnitude.
.GROUP SKIP 3
.ONCE CENTER
⊗5↓_2. Task_↓⊗*
The task of synthesizing
large INTERLISP[9] programs, from long specific
dialogues with a user, in a
restricted subset of English, was considered. This activity will be referred to
as ⊗4automatic programming⊗*.
The first question to settle in such an endeavor is what the
⊗4target programs⊗* (the generated code)
are going to be. More generally, what is the domain of the target
programs? A large amount of effort was spent on this question, and the
chosen domain was inductive inference programs. The
obvious problem here is the size and sophistication of the targets, but there
are four big attractions:
(i)
A wide range of complexity exists, from a one-page sequence
extrapolator to Meta-Dendral.
(ii) Each increasingly
sophisticated inductive inference program uses
many of the concepts embodied in
simpler inductive inference programs.
(iii) If a super-human target program is
ever written, it could itself contribute to the field of Automatic
Programming! (This remark is humorous in the seventies, but may be
commonplace someday.)
(iv) Since people (especially scientific researchers)
are the inductive
inference experts, our reliance on introspection is as
valid -- and potentially as valuable -- as chemists' protocols were
to Dendral.
After studying many sample programs from the inductive inference domain,
sequence extrapolation seemed the most reasonable beginning
task. It was quickly learned that this was too easy: humans have only
a few techniques for extrapolating sequences, and a very limited
capacity for composing them. Thus a rather rigid sequence
extrapolator writer may be capable of generating a large class of
super-human extrapolation programs (see section 4.2 on
Sequence-Extrapolator-Writer, in [3]).
The next candidates were grammatical inference and concept
formation [4].
Determined not to choose too simple a task again, the
latter program was finally decided upon. The particular target was similar to
[10], except Winston's heuristic
graph-matching algorithm was simplified.
.B
.E
.ONCE CENTER
⊗5↓_3. Target program_↓⊗*
It seems instructive now to describe in detail
how CF, the target program,
should operate. It repeatedly scans a scene
and tries to name it. The
scene is broken into a set of features and a set of objects. Each
feature is a relation
on one or more objects in the scene. Internally, CF
maintains a model for each differently-named
scene it has ever encountered.
The
model contains a description of the objects expected in the
scene, a set of features which must be present in any scene having this
name, a set of features which must not
be present in the scene, and a set of features which may or may not
be present. Thus a model is an archetypal scene plus a name.
Each time it is confronted by a new scene, CF must
scan its models until it finds one which matches it. A model is said
to match a scene if all the MUST features associated with that model
are observed in the scene, and all the MUSTNOT features are
absent from the scene. CF
informs the user of this guess,
and accepts the proper name. If it guessed incorrectly, it
modifies its models. The wrong-guess model may have features added to
its MUST or MUSTNOT sets.
This prevents CF from making the same wrong guess twice in succession.
The correct-name model may have to
be modified or (if it's a new name) created and inserted into the
list of models.
.B
For example, part of a scene might be:
OBJECTS a,b,c,d
RELATIONS (GREEN a) (BLUE c) (TOUCHES c d) (SUPPORTS a c) (SUPPORTS b c)
CF's current model for an arch might be:
NAME ARCH
OBJECTS a,b,c
MUST (SUPPORTS a c) (SUPPORTS b c)
MUSTNOT (TOUCHES a b)
MAY (GREEN a) (WEDGE c) (PRISM a) (BLOCK b) (PARALLEL a b)
.E
Suppose that the target program reads in the above
scene fragment and
tries to match it to the above model for consistency. The MUST
relations should all be present. Yes, the scene does contain
(SUPPORTS a c) and (SUPPORTS b c). Next, the MUSTNOT relations must
be absent from the scene. Sure enough, (TOUCHES a b) isn't there. So
the model and scene are consistent, and the program announces that its
guess is ARCH.
If the user verifies this guess,
then the MAY set of the ARCH model
is augmented with the relations (BLUE c) and (TOUCHES c d), and
the OBJECTS set is augmented with "d."
If the user denies that the scene is an arch, CF
sees if there are any relations in the ARCH model's MAY set which do not
occur in the scene. If so, one of them (e.g., (PARALLEL a b)) will
be transferred from the MAY to the MUST set. If no such feature
existed, the program would look for a feature present in the scene
but not mentioned in any set of the ARCH model (e.g., (TOUCHES c d)), and insert
it into the MUSTNOT set. In either case, the user would
be asked what the true name was, and that
model would have its MAY set augmented by any new
features in the scene. Any features on the true-name model's
MUST or MUSTNOT sets
which contradicted the scene would be transferred to the MAY set.
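The matching and model-updating rules above can be compressed into a short
sketch. It is given here in modern Python rather than INTERLISP, and every
name in it (matches, update_on_verify, update_on_deny) is illustrative,
not taken from CF itself:

```python
# Hedged modern sketch of CF's match/update cycle; features are tuples,
# and each model keeps its MUST, MUSTNOT, and MAY feature lists as sets.
ARCH = {
    "name": "ARCH",
    "objects": {"a", "b", "c"},
    "must": {("SUPPORTS", "a", "c"), ("SUPPORTS", "b", "c")},
    "mustnot": {("TOUCHES", "a", "b")},
    "may": {("GREEN", "a"), ("PARALLEL", "a", "b")},
}

def matches(model, scene):
    """A model matches a scene iff every MUST feature is present
    in the scene and every MUSTNOT feature is absent from it."""
    return model["must"] <= scene and not (model["mustnot"] & scene)

def update_on_verify(model, scene, objects):
    """Correct guess: unmentioned scene features join the MAY set."""
    known = model["must"] | model["mustnot"] | model["may"]
    model["may"] |= scene - known
    model["objects"] |= objects

def update_on_deny(model, scene):
    """Wrong guess: prefer promoting a MAY feature absent from the
    scene into MUST; failing that, move an unmentioned scene feature
    into MUSTNOT.  Either way the same wrong guess cannot recur."""
    absent = model["may"] - scene
    if absent:
        feature = absent.pop()
        model["may"].discard(feature)
        model["must"].add(feature)
    else:
        known = model["must"] | model["mustnot"] | model["may"]
        new = scene - known
        if new:
            model["mustnot"].add(new.pop())

scene = {("GREEN", "a"), ("BLUE", "c"), ("TOUCHES", "c", "d"),
         ("SUPPORTS", "a", "c"), ("SUPPORTS", "b", "c")}
assert matches(ARCH, scene)    # CF guesses ARCH for this fragment
```

On the scene fragment above, denying the guess promotes (PARALLEL a b)
into the MUST set, exactly as in the walkthrough.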
.B
.E
.ONCE CENTER
⊗5↓_4. Annotated protocol_↓⊗*
After the target concept formation program was specified, it
was trimmed and then rewritten as a structured program [2]. Next,
a complete dialogue was simulated between the user and a human
programmer (referred to as the system-player) playing the role of an
"intelligent" automatic programming system (similar to, e.g.,
[1]). The system-player kept careful records as he
programmed, and tried to create a bug-free structured program. The
dialogue was then annotated: after each user response, comments
were inserted which described the "states" the system-player had gone
through before printing his next response. These comments included blind
paths which were tried and uses of outside world knowledge; in general,
they were meant to capture the "reasoning" necessary to do the task. The
fear was that a system could be built which synthesized CF,
yet was so unintelligent that nothing was
learned from it (e.g., see section 4.1 on PW1 in [3]).
Henceforth, ⊗4protocol⊗* will
refer to this user-player / system-player simulated dialogue.
The central goals in writing PUP6 were that it
(i) correctly generate the target program CF,
(ii) follow the
protocol, and, most importantly, (iii) go
through the same line of reasoning that the comments indicated the
system-player
followed.
PUP6 would be
far along the road toward utility if it also (iv) could
write CF from several dialogues, from several users, with
little preparation. PUP6 was not designed to do this, and in the end
it proved to be a crippling deficit.
A corollary of this incremental annotated protocol approach is that the
abilities of the user must coincide with those of the subject who
participated in the protocol. To be successful,
the user must be familiar
with LISP, well-grounded in computer science, and have the
input/output behavior of CF, the concept formation target program, very
clearly in his mind. The natural language front end is so brittle
that the user must also phrase his responses very similarly to those
of the original protocol user-player.
.B
.E
.ONCE CENTER
⊗5↓_5. The BEINGs scheme_↓⊗*
The next major design issue is the internal mechanism for
achieving this task. The earlier PUP programs [3] had taught the
author to organize the system knowledge in ways meaningful to how it
would be needed. This need for structuring is also echoed in [7].
Many people, however, champion the merits of uniformity: in particular,
conceptually easy addition of new knowledge, and more
standardized (hence simpler) communication and interaction between
the knowledge. Not wishing to give up these advantages, a compromise
mechanism, BEINGs, was developed.
A BEING is a loosely-knit little expert, a collection of about
thirty different questions and their answers. Each such part of a BEING, called
an ⊗4organ⊗*, is
therefore an explanation of some facet of the BEING.
In other words, a specialist is
considered ⊗4equivalent⊗* to its answers to a couple dozen queries. In
answering a question, all a BEING may do is to create a new BEING, fill in
some BEING's answer to some question, and ask questions (of itself, of other
BEINGs, of the human user interacting with the system).
So far, this representation is isomorphic to the ACTOR[5] formalism.
The structure is present, all right, but what about the uniformity? It
seems that one can't add a new BEING (write its organs) unless he knows
the particular questions he can ask each other BEING in the system (the names
of their organs), and
that one BEING can't communicate with another unless it specifically
knows what that BEING can be asked.
Both problems theoretically disappear if
the set of questions is ⊗4fixed⊗* once and for all. That is,
let every BEING consist of answers to a single set of questions (have
the same set of organs), and
let the user and every BEING have access to (and implicit understanding of)
the names of these questions.
Of course the number of questions (organs) is arbitrary, but it
should not be ignored completely. By choosing it to be ~1,
a completely uniform representation is obtained.
By allowing it to be ~1000, one could simply take the
union of all the questions any BEING wanted, and thus all the uniformity
would disappear. The number of organs each BEING has is therefore a parameter
expressing the degree of compromise between structure and standardization
in the community of BEINGs.
The specific BEING organs chosen have much to do with the
epistemology of programming. The tentative set of organs implemented in
each BEING in PUP6 is listed below, followed by a brief
description. Also listed is the percentage of the original PUP6 pool
which actually wanted the organ specified.
.XGENLINES←XGENLINES-3
.BEGIN NOFILL
.FLUSH LEFT
⊗2IDEN⊗* 54% How is this BEING referenced in English phrases? Implemented as productions.
⊗2ARGS⊗* 63% How many arguments? Which are optional? Are there any local variables?
⊗2ARG-CHECK⊗* 81% Predicates which examine each argument for suitability.
⊗2EVAL-ARGS⊗* 4% Which arguments are to be evaluated, and which quoted?
⊗2WHAT⊗* 82% A brief summary of the global purpose. Usually a template for an English sentence.
⊗2WHY⊗* 77% A justification of the BEING's existence. The caller explains here why it was called.
⊗2HOW⊗* 72% A summary of the methods the BEING intends to employ. Global strategies.
⊗2EFFECTS⊗* 27% What will probably be true after this BEING is through? What can it achieve?
⊗2WHEN⊗* 19% Why should this BEING be given control now? Computed as the sum of weighted factors.
⊗2META-CODE⊗* 70% The body of the executable code, but with uninstantiated subparts.
⊗2COMMENTS⊗* 16% Hints for filling in undefined sections of other BEING organs.
⊗2REQUISITES⊗* 10% If this BEING is actually chosen, what must be made true just before (pre-),
during (co-), and after (post-) execution? Work to make these true (unlike ARG-CHECK).
⊗2DEMONS⊗* 7% Which demons should be kept active while the BEING is in control?
⊗2AFFECTS⊗* 14% Which other BEINGs might be called by this BEING, and why?
⊗2COMPLEXITY⊗* 92% A vector of utility measures, including ease of calling, chance of success, chance
of (calling)* itself, expected time cost, efficiency of its output results.
⊗2SPECIALIZATIONS⊗* 40% How to write a more streamlined, special-case version of this BEING.
⊗2GENERALIZATIONS⊗* 27% What are some other BEINGs, encompassing this one?
⊗2ALTERNATIVES⊗* 16% What are some equivalent BEINGs? (to try if this one fails)
⊗2RESULTS⊗* 15% How many values does this return? What domain is each in? Any side effects?
⊗2STRUCTURE⊗* 4% If this is ever viewed as a data structure, what operations are done to it, and how?
⊗2ENCODABLE⊗* 9% How to order the evaluation of the other organs, when writing a new version.
⊗2FORM-CHANGING⊗* 1% Should the entire structure of an existing BEING be radically altered?
⊗2INHIBIT-DEMONS⊗* 5% A lock/unlock mechanism, useful when handling demonic interrupts.
.END
.XGENLINES←XGENLINES-3
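The fixed-question compromise can be made concrete with a small sketch
(modern Python; the function names are invented here, and the organ list
is abridged from the table above). Uniformity comes from every BEING
sharing the same organ names, so any module -- or the user -- can ask any
question of any BEING:

```python
# Sketch, not the original code: a BEING as answers to one fixed
# question set.  An unfilled organ simply answers None ("I don't know").
ORGANS = ("IDEN", "ARGS", "WHAT", "WHY", "HOW", "WHEN",
          "META-CODE", "EFFECTS", "COMPLEXITY")   # abridged from the full set

def make_being(name, **organs):
    unknown = set(organs) - set(ORGANS)
    assert not unknown, f"organ names are fixed; {unknown} not allowed"
    return {"name": name, **{o: organs.get(o) for o in ORGANS}}

def ask(being, organ):
    """Asking a question of a BEING is just reading the named organ."""
    return being[organ]

info = make_being("OBTAIN-USABLE-INFORMATION",
                  WHAT="I obtain some information which can be used.",
                  WHY="PUP6 has no more immediately usable information.")
assert ask(info, "WHAT").startswith("I obtain")
assert ask(info, "WHEN") is None    # organ left unfilled
```

Because the question set is fixed, adding a new BEING never requires
learning another module's private message formats.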
The operational details of each organ will be explained by dissecting
OBTAIN-USABLE-INFORMATION,
a high-level, domain-independent BEING. For each organ is listed its
⊗2NAME⊗*, ⊗3a typical
question which it would answer⊗*, ⊗2its answer⊗*,
⊗3the LISP program stored as the value
of that organ in OBTAIN-USABLE-INFORMATION⊗* (the way it computed the answer), and
a brief discussion.
.BB
⊗2IDEN⊗* Can you recognize this: "Find out more about frobs"? ⊗2Yes. Call me with argument "frobs".⊗*
((if you see: (OBTAIN-USABLE-INFORMATION any1)
then return: (OBTAIN-USABLE-INFORMATION (TRANSLATE any1)))
(if you see: (FIND OUT MORE ABOUT any1)
then return: (OBTAIN-USABLE-INFORMATION any1)))
.E
Each BEING is given the duty to recognize and process English phrases
related to him. In practice, only a few simple phrase templates were
needed for each BEING; OBTAIN-USABLE-INFORMATION only used two patterns.
All the IDEN organs are collected together into one table. When a form
⊗4F⊗* must be translated, the TRANSLATE BEING takes control and (by
searching this table) asks "who can recognize ⊗4F⊗*?" Each IDEN organ
runs a little program (typically a pattern-match), then replies either
NO, or else provides a little program whose value should be the translation
of ⊗4F⊗*. If there is more than one responder, a global choice BEING,
CHOOSE-FROM, selects the winner. Often, as in the first case for this
BEING, the program which is
returned calls TRANSLATE recursively on some organs
of ⊗4F⊗*.
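The pooled-IDEN mechanism can be sketched as follows (modern Python; the
"any1" wildcard syntax follows the example above, but the matcher itself
and all names are my own minimal stand-ins, not PUP6's code):

```python
# Each BEING contributes phrase templates; TRANSLATE scans the pooled
# table, asking in effect "who can recognize F?", and a responder hands
# back a little program (here a lambda) whose value is the translation.
IDEN_TABLE = {
    "OBTAIN-USABLE-INFORMATION": [
        (("FIND", "OUT", "MORE", "ABOUT", "any1"),
         lambda b: ("OBTAIN-USABLE-INFORMATION", b["any1"])),
    ],
}

def match(template, form):
    """Match a flat template against a form, binding any1-style slots."""
    if len(template) != len(form):
        return None
    bindings = {}
    for t, f in zip(template, form):
        if t.startswith("any"):
            bindings[t] = f
        elif t != f:
            return None
    return bindings

def translate(form):
    """Ask every IDEN organ in the pool; take the first responder."""
    for being, patterns in IDEN_TABLE.items():
        for template, build in patterns:
            bindings = match(template, form)
            if bindings is not None:
                return build(bindings)
    return None

assert translate(("FIND", "OUT", "MORE", "ABOUT", "FROBS")) == \
       ("OBTAIN-USABLE-INFORMATION", "FROBS")
```

In PUP6 itself, multiple responders would be adjudicated by CHOOSE-FROM;
this sketch simply takes the first.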
.BB
⊗2EXPLICIT-ARGS⊗* What arguments do you take? ⊗2The single argument U⊗*. (U)
.E
The argument to this BEING is never operated upon, hence there is no need to
understand anything about it. It is simply passed to BEINGs called by
OBTAIN-USABLE-INFORMATION.
.BB
⊗2WHAT⊗* Basically, what do you do? ⊗2I obtain some information which can be used.⊗*
⊗2HOW⊗* How do you do this? ⊗2Either obtain new facts about old information, or
else obtain totally new information.⊗*
⊗2WHY⊗* Why do you do this? ⊗2PUP6 has no more information that is immediately usable⊗*.
.E
These three organs are primarily for the user's benefit. In the current
case, they don't even have any uninstantiated organs. If the user asks one
of these questions when this BEING is in control, then the appropriate
answer can simply be typed out.
.BB
⊗2WHEN⊗* How badly do you want control now, and why? ⊗2Overall rating is high because
there are 5 pieces of new information.⊗*
((if T then add in -10 because (THIS IS AN EXPONENTIALLY-GROWING, BAD THING TO DO IN GENERAL))
(if NEW-INFO-LIST then add in (PLUS 100 (LENGTH NEW-INFO-LIST))
because (WE SHOULD WORK ON UNASSIMILATED NEW INFORMATION IF THERE IS ANY)))
.E
The WHEN organ of a BEING is a collection of triples: if <predicate> then
<value> because <reason>. If the <predicate> evaluates to non-null, then
the <value> program is executed. It returns a number, which is then added
together with the other factors' numbers to produce a rough estimate of
how apropos this BEING is to take control right now. The <reason>
evaluates to an English phrase, for the benefit of inquisitive users.
This linear scheme is undesirable but (sigh) adequate. The first factor
here says to always add in the number -10; the second says
if there is some new information sitting around unexploited, to add in 100
plus the number of such pieces.
These factors and their weights, like the contents of all the organs
of all the BEINGs initially in PUP6, were decided upon and inserted by
hand.
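The linear WHEN scheme can be rendered directly (modern Python; the factor
contents mirror the example above, but the harness around them is assumed,
not taken from PUP6):

```python
# Each WHEN organ is a list of (predicate, value, reason) triples; the
# rating is the sum of the value programs of all true predicates, and the
# reasons are saved for inquisitive users.
def when_obtain_usable_information(new_info_list):
    factors = [
        (lambda: True,
         lambda: -10,
         "THIS IS AN EXPONENTIALLY-GROWING, BAD THING TO DO IN GENERAL"),
        (lambda: bool(new_info_list),
         lambda: 100 + len(new_info_list),
         "WE SHOULD WORK ON UNASSIMILATED NEW INFORMATION IF THERE IS ANY"),
    ]
    rating, reasons = 0, []
    for predicate, value, reason in factors:
        if predicate():
            rating += value()
            reasons.append(reason)
    return rating, reasons

rating, reasons = when_obtain_usable_information(["i1", "i2", "i3", "i4", "i5"])
assert rating == 95    # -10 + (100 + 5), the "high" rating in the example
```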
.BB
⊗2META-CODE⊗* What do you do specifically to achieve your results? ⊗2I have four specific BEINGs
decide among themselves who goes next.⊗*
(DO (CHOOSE-FROM (GET-NEW-INFORMATION U)
(TRANSLATE U)
(ANALYZE-IMPLICATIONS U)
(EXTRACT-RELEVANT-SUBSET U))
BECAUSE (WE CAN ONLY TRY TO OBTAIN USABLE INFORMATION IN ONE WAY AT A TIME))
.E
The META-CODE says to pass control to one of these four BEINGs:
Get-Brand-New-Information (in a minuscule subset of
English, from the user), Translate
something (transform from restricted English to BEING-calls),
Analyze-The-Implications (of some existing, translated information),
Extract-a-Relevant-Subset (of the existing information) to
concentrate upon.
The goal mechanism, SATISFY,
is usually employed when the BEING doesn't know exactly
which other BEINGs to call. If the ⊗4set⊗* of possible choices is known, as
in this case, then CHOOSE-FROM may be called with the explicit choice set.
.BB
⊗2MAIN-EFFECTS⊗* Are you relevant to making the user aware of the time? ⊗2No.⊗*
((to get (NEW INFO any1) do (OBTAIN-USABLE-INFORMATION any1))
(to get (USABLE INFO any1) do (OBTAIN-USABLE-INFORMATION any1)))
.E
The EFFECTS organs of each BEING are collected into
one table to facilitate asking all BEINGs simultaneously "Can you
cause effect X to occur?" Each EFFECTS organ examines X and the world, and
either replies No, or else returns a little program (involving calls
and constants) which should produce the effect.
This program generally will involve a call to the BEING
itself. OBTAIN-USABLE-INFORMATION shows
that it should be called to achieve
the existence of new or usable information.
.BB
⊗2AFFECTS⊗* Will you call TRANSLATE? ⊗2Possibly.⊗*
((CHOOSE-FROM IS CALLED)
(some BEING who can cause (AWARE USER (ABOUT TO OBTAIN USABLE INFO)) is called)
(TRANSLATE POSSIBLY IS CALLED)
(GET-NEW-INFORMATION POSSIBLY IS CALLED)
(ANALYZE-IMPLICATIONS POSSIBLY IS CALLED)
(EXTRACT-RELEVANT-SUBSET POSSIBLY IS CALLED) )
⊗2COMPLEXITY-VECTOR⊗* How difficult are you to call? ⊗2Average.⊗* (.5 .5 .5 .1)
.E
The first component says that OBTAIN-USABLE-INFORMATION is of average
difficulty to call. Next, there exists a .5 chance that some descendant
will call it again. The third component indicates that the cost of trying
this BEING is typical. Finally, there is no good reason for inhibiting
it ever. In general, each component can be a ⊗4program⊗*,
not just a constant.
.BB
⊗2GENERALIZATIONS⊗* What BEINGs are more general than you are? ⊗2WRITE-PROGRAM
and SERVE-THE-USER⊗*. (WRITE-PROGRAM SERVE-THE-USER)
⊗2ALTERNATIVES⊗* In case you fail, who else might I try? ⊗2Try USE-INFORMATION,
FIX-INCORRECT-PIECE, OPTIMIZE, or FILL-IN-UNDEFINED-SECTION.⊗*
.E
If this BEING fails, try the alternatives; if they fail, consider trying the
more general BEINGs.
To illustrate a few of the other BEING organs, consider PARTITION-A-DOMAIN,
a high-level,
domain-specific BEING.
Its
basic idea is to divide up a space into categories. Only two
BEING organs are filled in here which were absent from
OBTAIN-USABLE-INFORMATION: SPECIALIZATIONS and DEMONS.
The
SPECIALIZATIONS organ says that to write a specific version of itself,
a few decisions must be made. The results will then indicate what
pieces of code get assembled together to form the new BEING. The
partition may be only partial (an element of the domain belongs to no
class of the partition), only weak (an element belongs to more than
one class), and, most importantly, the specialized partitioning
BEING should work
by repeatedly doing some of the following three activities: accept a
class-name from the user and guess some of its elements, accept an
element from the user and try to guess which class it belongs to,
or accept an <element, class-name> pair. Other BEINGs know
about these accepting, guessing, and checking activities.
The DEMONS organ says that during the partitioning, the only
new demon to keep active is the one called Fringe-of-Consciousness.
This deserves elaboration. In the course of
writing GI, the system learns that the main task is one of
grammatical inference. The GRAMMAR BEING then asserts
that if the user ever mentions the word ⊗4test⊗*, he may actually mean
⊗4parse⊗*. This is placed in the IDEN organ of the TEST BEING, and is
therefore only demonized, waiting on the periphery of PUP6's
concentration. It is in fact triggered later in the dialogue, as an
inference surprising to the user.
.B
.E
.ONCE CENTER
⊗5↓_6. Control in the system_↓⊗*
The BEINGs control themselves in a simple way. A very general
BEING, SERVE-THE-USER, has control initially. In general, some BEING
⊗4B⊗* will be in control. Its organs are assembled into an
executable function, and it is run. During the course of its reign,
⊗4B⊗* will want things done and/or tested which it cannot manage by
itself. This corresponds to when a normal program needs to call a
subroutine. What ⊗4B⊗* does is to provide SATISFY, the goal mechanism,
with a description of what is wanted. SATISFY asks the
entire BEING pool "who can do this thing?", and collects a list of
the responders. CHOOSE-FROM is then called, and asks each responder
why he
should be allowed to go now (the WHEN organ answers this) and how costly he will be
if he does go (the COMPLEXITY organ). The responders are then
passed control, in order, until one succeeds or all have failed.
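The cycle just described -- broadcast a goal, collect responders, rank
them, try them in order -- can be sketched as follows (modern Python; the
BEINGs, ratings, and goal format are all invented for illustration):

```python
# SATISFY asks the whole pool "who can do this thing?" via the EFFECTS
# organs, CHOOSE-FROM orders the responders by their WHEN ratings, and
# control passes down the list until one succeeds or all have failed.
def satisfy(goal, pool, world):
    responders = [b for b in pool if b["EFFECTS"](goal)]
    # CHOOSE-FROM: ask each responder how badly it wants control now.
    responders.sort(key=lambda b: b["WHEN"](world), reverse=True)
    for being in responders:
        if being["RUN"](world):      # pass control; True means success
            return being["name"]
    return None                      # every responder failed

pool = [
    {"name": "GET-NEW-INFORMATION",
     "EFFECTS": lambda g: g == ("NEW", "INFO"),
     "WHEN": lambda w: 50,
     "RUN": lambda w: False},        # rated highest, tries first, fails
    {"name": "TRANSLATE",
     "EFFECTS": lambda g: g == ("NEW", "INFO"),
     "WHEN": lambda w: 30,
     "RUN": lambda w: True},         # succeeds on the second attempt
]
assert satisfy(("NEW", "INFO"), pool, {}) == "TRANSLATE"
```

A COMPLEXITY estimate would also enter the ranking in PUP6; this sketch
keeps only the WHEN rating.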
In addition to these goal statements, there exists a
"world" consisting of assertions and variables with values. These are
regarded as trivial cases of BEINGs, possessing only one organ. So
⊗4B⊗* may also query this data base by saying "what assertions match
this one...", or by asking "what is the value of...". These actions
can be programmed in a much more efficient manner than as BEINGs.
They never employ nondeterministic transfers, hence are faster by
an amount proportional to the total number of BEINGs in the system.
The way a BEING's organs fit together
is uniform over all the BEINGs at all times. Thus one simple function
(an assembled BEING) assembles all the BEINGs initially into LISP
functions. The precise algorithm for doing this is now discussed.
First, the explicit arguments (those global to the
BEING) are bound. The implicit arguments (those local to the BEING,
like PROG variables) are initialized. The name of the BEING is pushed
onto the BEING control stack, and any
newly-activated demons are pushed onto the demon stack. The
ARGS-CHECK predicates are evaluated. Then PUP6 works to make each
PRE-REQUISITE true in the world. Each COMMENT is evaluated, then the
CO-REQUISITES, META-CODE, and current demons all are executed in
pseudo-parallel. Each POST-REQUISITE is then satisfied. These
activities are all embedded in an AND, so any of them can cause the
entire BEING to halt and fail, simply by returning NIL.
Just before exiting, the demon and BEING
stacks are popped,
and control passes to the delegated BEING.
A fancy context
mechanism was available but not actually needed.
The function which assembled all the BEINGs exploited the
fact that it operated only at system load time. Thus if the BEING
had no new DEMONs to activate, all the corresponding demon-stack
manipulations could be omitted.
Similarly, no CO-REQUISITES means no parallel processing need be
simulated. Optimizations like these are clear
from comparing the organs of OBTAIN-USABLE-INFORMATION, the algorithm
just described, and the actual functional version produced:
.BEGIN NOFILL SELECT 3 GROUP
(OBTAIN-USABLE-INFORMATION
(LAMBDA (U)
(AND
(PUSH OBTAIN-USABLE-INFORMATION BEING-STACK)
(PUT OBTAIN-USABLE-INFORMATION SPEC-WHY BECAUSE)
(EVERY CURRENT-DEMONS APPLY*)
(SETQQ BECAUSE (WE CAN ONLY TRY TO OBTAIN USABLE INFORMATION IN ONE WAY AT A TIME))
(CHOOSE-FROM
(GET-NEW-INFORMATION U)
(TRANSLATE U)
(ANALYZE-IMPLICATIONS U)
(EXTRACT-RELEVANT-SUBSET U)))
(POP BEING-STACK)))
.END
Each BEING has the responsibility to set the value of the variable
BECAUSE just prior to calling another BEING. When a BEING takes control, it
immediately stores the current value of BECAUSE as a special addition to its
WHY organ. Thus each BEING can provide two reasons: one supplied in general,
and one set by the particular BEING which calls it. Above,
OBTAIN-USABLE-INFORMATION sets a reason for calling CHOOSE-FROM.
Notice that many organs can answer queries, but play no role once the BEING gains
control.
.B
.E
.ONCE CENTER
⊗5↓_7. Theoretical aspects of PUP6_↓⊗*
For aesthetic reasons, an effort was made to keep the languages for
PUP6 and CF the same. This meant PUP6 must write CF as a family of BEINGs.
Consequently ⊗4some⊗* BEING(s)
must write new BEINGs. The significant design issue here is
that the BEING who knows about
⊗4X⊗* takes charge of generating new BEINGs relating to ⊗4X⊗*. For
example, the INSERT BEING can do inserting by one of a few
algorithms, and he can also write (by specializing himself) more
specific insert routines, and he can answer questions about
inserting.
This idea is analogous to any reliance upon experts. The
SPECIALIZATIONS and ENCODABLE organs handle this chore.
An unusual consequence is that the synthesized code will have all
the ⊗4types⊗* of abilities that any BEING community has: it
can modify itself according to the user's
complaints and answer questions about what it is doing as it runs.
The CF program generated cannot, of course, write the full complement of
programs PUP6 could write.
In a similar vein, recall that ⊗4each⊗* BEING must do the translation
of the users' relevant quasi-English inputs into BEING-usable form.
For example, our INSERT BEING must also recognize and
process phrases involving inserting, stack-pushing, and merging.
PUP6 does not operate within the
exploratory anthropomorphic paradigm of programming
(ignore details, catch them
later by debugging). As in all programming,
decisions continually crop up which
the system is not able to resolve at the time. PUP6 always tries to
defer the decision as
long as possible. When, at last, no progress can be made without its
resolution, and if the system is still unsure, then either the user
settles the question or else a backtrack point is reluctantly set up.
But often, by this time, some new information is present which
enables the system to resolve the decision, thus reducing the amount
of backtracking. If there are two or more decisions which can no
longer be deferred, the system tackles the easiest one first.
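The deferral discipline can be pictured with a small sketch (modern
Python; the DecisionPool structure, the difficulty scores, and the sample
decisions are entirely my own invention, kept only to the behavior
described above):

```python
import heapq

# Decisions wait in a pool until progress is blocked; among the decisions
# that can no longer be deferred, the easiest is resolved first.
class DecisionPool:
    def __init__(self):
        self.pending = []            # heap of (difficulty, name, resolver)

    def defer(self, difficulty, name, resolver):
        heapq.heappush(self.pending, (difficulty, name, resolver))

    def forced(self, blocked_on):
        """Resolve the easiest decision among those now unavoidable."""
        candidates = [d for d in self.pending if d[1] in blocked_on]
        if not candidates:
            return None
        easiest = min(candidates)
        self.pending.remove(easiest)
        heapq.heapify(self.pending)
        return easiest[2]()          # may consult new info, the user, or backtrack

pool = DecisionPool()
pool.defer(5, "representation-of-models", lambda: "property-list")
pool.defer(2, "name-of-loop-variable", lambda: "U")
assert pool.forced({"representation-of-models",
                    "name-of-loop-variable"}) == "U"
```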
It was hoped that most of the carelessness
bugs could be eliminated through this deferral, feed-forward, and very
precise record-keeping. Humans depend on their adaptability to
compensate for limitations in their brain hardware
[8] but there is no
need for an ⊗4automatic⊗* programming system to do so. For example,
when a list structure is first encountered, the system records
warnings: that it is as yet undefined, that no accesses, insertions, or
deletions have been observed, and so on. Each such worry is weighted, and
periodically the system resolves such warning notes -- especially
near the completion of the target program.
The new BEINGs created
during the execution of a specified task are kept distinct from the
original set of BEINGs. A ⊗4program⊗* for task ⊗4T⊗* is
accomplished by doing ⊗4T⊗* and then dumping the new BEINGs out onto
a new file. The entire old BEINGs pool is then treated as the
language supporting this file. As a corollary, asking a BEINGs-based
system to write any subset of itself is the null task.
Most programmers intentionally augment their code to aid in
later debugging, modification, and readability by humans. These aids
are typically comments, summaries, over-generalized subroutines,
optional print statements, and runtime statistics. Recently, the
structure of the program itself has also been
recognized as a powerful
manipulable element, affecting the accessibility of the code, not just
its length or running time.
The length of any program written as a pool of
BEINGs is several times as long as a clean hand-coded version. This
extra code, the superfluous organs of each new BEING, may be
viewed as organized self-documentation; in theory it should improve the
debugging, expansion, and accessibility (to alien users) of the
synthesized code.
Some assertions are so meaningful to the user that they are
stored in the code itself, even if they are
only temporary. At any time, the user
may look at a piece of code; the comments present ⊗4then⊗* are all
assertions pertaining to that piece of code at that time.
BEINGS may use comments at several different levels. At the
lowest level, each comment is merely a unique token; it may be
inserted, removed, or searched for. Slightly better, the BEINGs
system can ask "is there a comment involving ...". For this purpose, a
constrained syntax for the comments should be developed. Otherwise
(as, unfortunately, in PUP6) each new comment must be attended to
ad hoc.
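The levels of comment use might be sketched as follows, in a modern Python illustration (the class and the comment syntax are hypothetical, not PUP6's actual INTERLISP representation): at the lowest level a comment is an opaque token that can be inserted and removed; given a constrained (keyword, details) syntax, the system can also ask "is there a comment involving ...".

```python
class CommentBase:
    """Comments attached to pieces of code. Each comment is a
    (keyword, *details) tuple, so it can be treated either as an
    opaque token or queried by its keyword."""
    def __init__(self):
        self.comments = {}          # code-piece name -> list of comments

    def insert(self, piece, comment):
        self.comments.setdefault(piece, []).append(comment)

    def remove(self, piece, comment):
        self.comments.get(piece, []).remove(comment)

    def involving(self, keyword):
        # Level two: "is there a comment involving ...?"  Possible only
        # because of the constrained comment syntax.
        return [c for cs in self.comments.values() for c in cs
                if c[0] == keyword]

cb = CommentBase()
cb.insert("MAIN-LOOP", ("INFINITE-LOOP", "no breakaway test yet"))
cb.insert("IMPOSS", ("UNDEFINED", "base case deferred"))
assert cb.involving("INFINITE-LOOP")
```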
.B
.E
.ONCE CENTER
⊗5↓_8. Ideal and real systems_↓⊗*
When imagining an ideal AP (automatic programming) system,
one ability we might wish for is that it be able to input a
well-documented program, glean pure programming knowledge from it,
and transform this into a format it can use. Also, to improve
itself, it should be capable of digesting a sample, valid AP dialog,
and (perhaps through additional user interaction) of modifying itself so it
could actually carry on that dialog.
Another ideal to hope for is that the user be permitted to do whatever
he can whenever he can; if he suddenly thinks of a stray caution, the
AP system should accept it (e.g., "Oh, I might want to type in
HALT instead of STOP, sometimes").
BEINGs were not designed with
these goals in mind, and as a result they may be an
insufficient framework for achieving them.
By studying the difficulties of PUP6,
isolating those due to poor programming from those due to
poor ideas, enough may be learned to build the next system one
qualitative step closer to this ideal.
It is in this spirit that BEINGs
are now contrasted against the concepts of ACTORs, CLASSes,
FRAMES, HACKER, formal AP systems, and macro expansion schemes.
ACTORS [5] provided the key concept of integrating
uniformity of construction with sophistication of behavior.
Each ACTOR module has its own, unique organs, hence must be (implicitly)
aware of all the organs (message formats)
of all the ACTORs it will ever shake hands with.
Adding a new module is thus conceptually intricate as
well as practically difficult. CLASSes have a few standard
organs, and the modules are arranged in groups, each of which has its
own additional types of organs. Thus a module can ask ⊗4any⊗* other module
one of a few universal questions, and it can ask any module in its group
an additional set of questions. If it is permitted to know about other
groups' special organs, then the same adding problem recurs. If it is
denied such knowledge, it cannot access much of the knowledge in the
pool. Yet if one requires a completely universal set of message types, then
most of them will be irrelevant to most modules. This is the price which
BEINGs pay. This superfluity is real and almost
intolerable.
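The price BEINGs pay can be made concrete with a minimal sketch (Python used for concision; the organ names shown are a hypothetical subset of PUP6's actual set): every module carries the same universal slots, so any module may question any other, but most slots sit empty in most modules.

```python
# Every BEING answers the same fixed question set; slots irrelevant
# to a given BEING simply remain empty.  (Organ names here are
# illustrative, not PUP6's full set of 29.)
UNIVERSAL_ORGANS = ["WHAT", "HOW", "WHY", "IDEN", "EFFECTS",
                    "META-CODE", "SPECIALIZATIONS", "CO-REQUISITES"]

def make_being(name, **filled):
    being = {organ: None for organ in UNIVERSAL_ORGANS}
    being["NAME"] = name
    being.update(filled)
    return being

forsome = make_being("FORSOME", WHAT="test a predicate over a set")
# Any module may ask any other module any universal question ...
assert forsome["WHAT"] is not None
# ... but most slots are irrelevant to most modules: the superfluity.
empty = sum(1 for o in UNIVERSAL_ORGANS if forsome[o] is None)
assert empty == len(UNIVERSAL_ORGANS) - 1
```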
FRAMES [6] seem superficially similar to BEINGs, and
are so amorphous that they actually could subsume BEINGs. There is a
deep difference of philosophy and of intended usage, however.
FRAMES intentionally have default values already (with no processing)
filled in for each frame, and replaced only when necessary.
This is akin to automatic programming by blind debugging, but is
antithetical to the fastidious bookkeeping BEINGs philosophy. As an
example, consider the writing of a short, recursive
LISP program (reverse, flatten, factorial, etc.).
A human, consulting the proper frame in his head, knows to ignore the
base step for a while, then fill it in, usually as ⊗4if NIL, then
NIL⊗*. The
human makes this default synthesis, tries out the program, and looks
closer (to fill in this "slot" properly) only if something calls his
attention to it (such as a LISP error message:
NON-NUMERIC ARG ⊗4NIL⊗*, e.g., if what was really needed was the base
step ⊗4if NIL, then 0⊗*).
A BEINGs system would
also defer working on the base step, but could -- and would -- place
a note about that deferral into the assertional warning base. Before
thinking it was finished, the system would attend to that note
carefully. This is not a criticism:
FRAMES are meant to
model perception, BEINGS are not.
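The contrast might be sketched as follows (a minimal Python illustration; the function names and the weighting scheme are assumptions, not PUP6's actual mechanism): where FRAMES fill a slot with a default and revisit it only if an error surfaces, a BEINGs-style system defers the slot ⊗4and⊗* records the deferral, refusing to consider the target finished while notes remain.

```python
# A BEINGs-style deferral base: every deferred decision leaves a
# weighted warning note, and all notes must be resolved before the
# system may claim the target program is complete.
warnings = []

def defer(slot, reason, weight):
    warnings.append({"slot": slot, "reason": reason, "weight": weight})

def finished():
    # The system may not consider the target done while notes remain.
    return not warnings

defer("base-step", "usually (if NIL then NIL), but unverified", weight=3)
assert not finished()
# Near completion, the heaviest worries are attended to first.
note = max(warnings, key=lambda w: w["weight"])
warnings.remove(note)          # ... resolved after consulting the user
assert finished()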
The contrast with HACKER is similar to that with FRAMES.
The orientation of HACKER is to put together pieces of programs
into something close to the final desired target, then try to run
it. When errors result, they are analyzed and then patched. PUP6
is oriented toward writing bug-free code; what it writes must always
coincide with its internal specifications of that code. The user may
later change his mind, and the system should be (and occasionally
is) able to modify its
own programs, but this process is much better defined than blind
debugging. On the other hand, if PUP6 does have an
unexpected bug in it, it may be fatal. One
way to overcome this would be to include special recover-from-bugs
BEINGs.
The formal manipulation systems which also synthesize code
are so different from BEINGs, that it is difficult to compare them.
These logical systems (e.g., Luckham's) attack an entirely different
problem. Their task is specified rigorously in advance, and they
proceed formally to search for a solution program. PUP6 is
designed to work from a much more
ambiguous specification, heuristically. Each action
is implicitly justified by the fact that the user can later describe
to the system
what is undesirable about the program it has generated. PUP6 should
thus be tolerant of user ambiguity and error.
As with ⊗4any⊗* AP system which writes structured programs, the
external behavior of PUP6 is
reminiscent of macro expansion regardless of ⊗4what⊗* the internal
BEING interactions are. One major distinction between the two
concepts is when and how the macros are expanded. Traditional answers
are: at compile time, by rigid substitutions.
BEINGs systems expand their "macros" at
times they choose, using knowledge gleaned from each other and from
the user. For example, consider a series of macro calls occurring in
the code: (m1)(m2)(m3). A macro expander could expand these in any order,
and the three activities could go on simultaneously, with no interference
or change in the final code. BEINGs would try to find some task common
to all three calls, and work on that first. The order in which to
expand each fully would be carefully weighed,
and during that expansion new
information would be learned which could affect the expansions of the
other two function calls. The final code would not simply be the APPEND
of the expansions of m1, m2, m3, as it would with a macro scheme.
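The distinction can be made concrete with a small sketch (Python; the expansion tables and the "common subtask" heuristic are illustrative assumptions, not PUP6's machinery): a rigid expander appends independent expansions, while a BEINGs-style expander first factors out a task common to every call and lets what is already emitted shape the remaining expansions.

```python
def macro_expand(calls, table):
    # Rigid compile-time substitution: order cannot matter, and the
    # result is simply the APPEND of the individual expansions.
    return [line for c in calls for line in table[c]]

def beings_expand(calls, table):
    # Hypothetical sketch: find a subtask common to every call, emit
    # it once, then expand each call in the light of what has been
    # learned so far (here, simply: skip lines already emitted).
    common = set.intersection(*(set(table[c]) for c in calls))
    code, emitted = sorted(common), set(common)
    for c in calls:
        for line in table[c]:
            if line not in emitted:
                code.append(line)
                emitted.add(line)
    return code

table = {"m1": ["init", "read"], "m2": ["init", "sort"], "m3": ["init", "print"]}
assert macro_expand(["m1", "m2", "m3"], table).count("init") == 3
assert beings_expand(["m1", "m2", "m3"], table).count("init") == 1
```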
.B
.E
.ONCE CENTER
⊗5↓_9. Questions for AP systems_↓⊗*
The success of an AI effort
can be assessed only where accepted standards of intelligence exist.
The character of the dialogue (described in the next section)
is, of course, one important measure of the intelligence
of an AP (automatic programming)
system. One which asked "What is the first instruction?
What is the second...?" would be universal but worse than useless.
This section proposes some questions one should ask when
confronted by a new AP system
which generates code heuristically from a dialogue.
Capsule answers pertaining to PUP6 are parenthesized after each question;
fuller answers are located elsewhere.
The questions break into two categories. First, one wants to
know what the system does. Most important, does it synthesize a program?
(yes.)
If so, how complex is that task, and what is the program
specification like? (a concept formation program, several pages long,
from one specific protocol dialogue).
How precise must the user be: may he err (no),
forget (no), change his mind? (in limited cases.)
How detailed must his dialogue be? (by construction, just about as
detailed as if he were talking to a CS grad student programmer.) How
closely does the dialogue determine the details of the final code?
(much of the
final code is clearly determined by the precise user responses.)
How does the user know where he is during the dialogue? (simulated cursors
pointing to a graph representing synthesized code.)
Does the user ever get lost? (Often.)
Quite seriously, an important question is whether the system
writes a second program. (yes.)
If so, how does it relate to the first one
synthesized? (both are inductive inference LISP programs.)
How much of the AP system is actually used in writing
⊗4both⊗* programs? (49 BEINGs out of 79.)
One should ask what the full range of programs is
which the system has successfully synthesized. (the concept former, the
grammatical inferrer, and the simple property list manipulation system.)
The dual questions to
these inquire whether a program may be generated by several different
(in the sense of rephrasing and in the sense of reordering the subtasks)
dialogues (theoretically, but not practically),
and what the range of variability is. (theoretically large, but
practically quite limited.) Similarly, what
about the target language: is it a parameter? (no, always the same.)
How does it relate to
the language the system is written in? (both written as BEINGs in
INTERLISP.)
Does the system modify as well as write code? (yes, but not
as its major mechanism.) If so, can
it modify anybody's, or only those programs it has written itself?
(only its own, and only in simple ways.)
What is its "style" of programming? (many supplementary comments,
fairly clean structured programs written as BEINGs.)
One also wishes to know the size
of the tool as well as the size of the job: How does the system size
compare to target program size? (one order of magnitude larger.)
Efficiency considerations are often the crucial
pragmatic ones. Economics and current hardware restrict the amount of
computation which may be done in toto; the expected lifetime of the
universe restricts us from using the most "elegant" algorithms; and
human patience restricts the interaction delay time (to a minute or
two of real time). One must therefore ask things like
how much time and space it requires, how long a delay there is
between user inputs, etc. (one CPU hour, 100K, ≤1 CPU minute
between responses, compiled INTERLISP run on a PDP-10 TENEX system.)
The second class of questions concerns the theoretical side
of the AP system. What new ideas is it meant to experiment with?
(using BEINGs to write large programs).
How
do these ideas fare? (mixed results; this is the subject of
most of the remainder of this paper).
Does it write its code in the right way?
(it asks the right questions of the user at just the proper
times with respect to the original protocol.)
How extendable is it: how difficult is it to add new pieces
of knowledge and to modify old ones? (theoretically easy, practically it
requires familiarity with the system.) Could it write programs in
various fields (possibly), in various styles (unlikely), in various sizes?
(the three target programs differ by less than one order of magnitude.)
How is knowledge represented internally? (BEINGs, assertions,
demons.) Is any of it
duplicated? (not much above the LISP language level; some universal
knowledge is factored out; e.g., how to CHOOSE-FROM a set of BEINGs.)
If so, why? (too primitive to factor it out; e.g.,
how to bind variables.)
.B
.E
.ONCE CENTER
⊗5↓_10. Examples from the dialogue which results in CF_↓⊗*
The dialogue to synthesize the concept formation
program (CF) begins by PUP6 asking the
user for any task. The user replies, ⊗4Write a program which does
concept formation⊗*. There are many decisions that PUP6 knows about in
writing a specialized concept formation program [4],
and it was given just enough concept formation knowledge to
defer each. For example, it must choose between classificatory,
comparative, and metrical concept formation. Since all three involve
partitioning a domain of examples, PUP6 decides it can defer this
choice until after code for the partitioning is synthesized. The
effort of the system is now directed toward this "piece" of the
program. When it is completed, an hour or two later, the system asks
the user to make this decision. When he responds ⊗4CLASSIFICATORY⊗*,
which involves ⊗4only⊗* partitioning, PUP6 prints that it has
finished the entire CF task, and dumps the newly created BEINGs onto
a file.
An example of the PUP6 community of BEINGs interacting will now
be presented.
When a BEING is said to do or recognize or know something, as
in the following paragraphs, what is actually meant is
that one or more of its organs specifically encode that knowledge. All
the organs of the hundred BEINGs in PUP6 were coded by hand, not by
each other. They then encoded other BEINGs, interacting only via
the dialogue.
Let's jump one third of the way into the dialogue which results in
creating CF. The system learns it must compare the input scene against
its internally stored models of concepts, until it finds one which
passes the comparison. It asks the user what it means for
scene S to fail the comparison to model M. The user replies,
whenever M is incompatible with the observed features of S.
The IDEN organ of the
CONTRADICTS BEING recognizes this sentence pattern, and transforms it
to
(FORSOME F IN M-FEATURES (⊗4specialized-contradicts⊗* F S-FEATURES)).
The BEING
which inserts this into the synthesized code requires that the user
pick some proper name for the new BEING ⊗4specialized-contradicts⊗*;
this will be a streamlined contradiction test predicate written by the
CONTRADICTS BEING. Say the user responds by calling it IMPOSS. This naming
and specializing is central to BEING creation: a BEING recognizes an
instance of itself, and decides either to insert a call to itself or
else to insert a call to a specialized version of itself. If any
nontrivial decisions must be made, and can be settled at synthesis
time, then the latter alternative is chosen. CONTRADICTS is too
general a BEING to be called in an inner loop, so its content will
be hammered out at synthesis time. On the other hand,
FORSOME is so primitive that no specialized
version of it is written normally.
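This naming-and-specializing decision might be sketched as follows (a Python illustration; the predicate arguments are assumptions about what the decision depends on, not PUP6's actual test):

```python
# A BEING recognizing an instance of itself inserts a call to a
# specialized version only if nontrivial decisions exist AND can be
# settled at synthesis time; otherwise it inserts a call to itself.
def insert_call(being, nontrivial_decisions, settleable_now, ask_name):
    if nontrivial_decisions and settleable_now:
        name = ask_name(being)        # e.g. the user answers "IMPOSS"
        return (name, "specialized version, hammered out at synthesis time")
    return (being, "direct call to the general BEING")

# CONTRADICTS is too general to sit in an inner loop: specialize it.
call, kind = insert_call("CONTRADICTS", True, True, lambda b: "IMPOSS")
assert call == "IMPOSS"
# FORSOME is so primitive that no specialized version is written.
call, kind = insert_call("FORSOME", False, True, lambda b: None)
assert call == "FORSOME"
```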
Here is the way this piece of the dialogue actually appears.
The user's responses are italicized.
.BEGIN NOFILL FLUSH LEFT
PUP: Please type in a logical expression which is true iff we terminate the loop
USER: ⊗4Any feature in model-features is incompatible with scene-features⊗*
PUP: PUP wants USER to type in name for specialized version of CONTRADICTS
USER: ⊗4IMPOSS⊗*
.END
Later in the dialogue, PUP6 decides it must
expand the function IMPOSS. The CONTRADICTS BEING is again called on; this
time it is asked how to write a specialized version of a
contradiction test. Its SPECIALIZATIONS organ
replies that ⊗4some of⊗* the following types of
contradiction may occur: PROBABILITY=0, PROBABILITY=1, and
PROBABILITY⊗4ε⊗*(0,1). This is the germ of the idea for the
MUSTNOT/MUST/MAY categorization of features. The SOME-OF BEING takes
control, and asks if the decision can be deferred.
Eventually
the
DEFERRAL BEING reports failure. SOME-OF sees it has no basis upon
which to form a guess, and wants somebody to ask the user. The
ASK-USER BEING takes
over.
The names of the three choices are so cryptic that
their WHAT and HOW organs are printed out to the user, as well as
their names. The user types back his choices, in our case all three.
SOME-OF regains control and composes
three new BEING calls, separated by a
conditional:
.BEGIN GROUP NOFILL SELECT 3 INDENT 0
(COND
((IS-OF-TYPE F PROBABILITY=0-PART-OF-M-FEATURES) (PROBABILITY=0-CONTRADICTION F S-FEATURES))
((IS-OF-TYPE F PROBABILITY=1-PART-OF-M-FEATURES) (PROBABILITY=1-CONTRADICTION F S-FEATURES))
((IS-OF-TYPE F PROBABILITY⊗4ε⊗*(0,1)-PART-OF-M-FEATURES) (PROBABILITY⊗4ε⊗*(0,1)-CONTRADICTION F S-FEATURES)))
.END
TRICHOTOMY recognizes this as a split on a function's values
being =0, =1, or between zero and one. It asks whether this
particular function can only range over the interval [0,1].
PROBABILITY answers affirmatively, so SOME-OF replaces the final
IS-OF-TYPE test by the constant T.
That is, any feature not passing one of the first two IS-OF-TYPE tests ⊗4must⊗*
pass the final one, so the final one is unnecessary.
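TRICHOTOMY's simplification can be sketched in a few lines (Python; the clause representation is an illustrative assumption): when the case tests of a conditional are known to exhaust the range, the final test may be replaced by the constant true.

```python
# When the tests of a conditional are exhaustive, the last test is
# unnecessary: any case failing the earlier tests must pass it.
def simplify_cond(clauses, exhaustive):
    """clauses: list of (test, action) pairs."""
    if exhaustive:
        test, action = clauses[-1]
        clauses = clauses[:-1] + [("T", action)]
    return clauses

clauses = [("prob=0 member?", "check-0"),
           ("prob=1 member?", "check-1"),
           ("prob-in-(0,1) member?", "check-mid")]
# PROBABILITY confirms the function ranges only over [0,1], so the
# three cases exhaust it and the final test becomes the constant T.
assert simplify_cond(clauses, exhaustive=True)[-1][0] == "T"
```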
The three contradiction-checking predicates
know (their ENCODABLE organs tell them) that
they are immediately encodable. A probability=0 contradiction means
that arg1 must not occur in the world arg2 yet it does. So the
META-CODE of the PROBABILITY=0-CONTRADICTION BEING is defined as
(MEMBER arg1 arg2). This corresponds to a MUSTNOT feature F which is
present in the world (in
the set S-FEATURES, features of the scene). A
probability=1 contradiction occurs when a feature arg1 must occur in
the world arg2, yet it doesn't. So its definition is simply (NOT
(MEMBER arg1 arg2)). It is impossible for a feature with probability
in (0,1) to be in contradiction with any world lacking negated
features, so this third predicate is defined as the constant NIL.
That is, if F is a MAY feature of the model M, then there can be no
contradiction between F and ⊗4any⊗* features of the scene S.
The IS-OF-TYPE BEING recognizes that it must work hard to
fill in organs for each of the two case tests, unless M-FEATURES is organized
permanently into three parts (thus permitting the complex calls
"IS-OF-TYPE" to be replaced by the simple ones "MEMBER" in the above
piece of code). The STRUCTURE-INDUCING BEING takes over, probing
the permissibility and the difficulty of such a constraint. It finds
nothing about this tripartite structuring which could damage any
earlier synthesized code, and asks the user's blessing. Demons are
created, stating that any reference to
the entire list of M-FEATURES must be replaced by either APPEND of
the three new lists, or else by three separate statements. GET-NAME
is indirectly called, and he has the user name the three new sets of
features; say he responds by calling them MUSTNOT, MUST, and MAY. The ENCODE
BEING says to draw an arrow on its graph of newly
created BEINGs, going
from the BEING call (IMPOSS F S-FEATURES) to the new block of code:
.BEGIN GROUP NOFILL INDENT 0 SELECT 3
(COND ((MEMBER F MUSTNOT-PART-OF-M) (MEMBER F S-FEATURES))
((MEMBER F MUST-PART-OF-M) (NOT (MEMBER F S-FEATURES)))
( T (COMMENT THIS "T" REPLACES "MEMBER F MAY-PART-OF-M") NIL))
.END
This is now the META-CODE organ of the new BEING called IMPOSS.
Most of its other organs are cloned from its generalization, namely
CONTRADICTS. During the course of writing this piece, however, some
of these organs should be slightly changed. For example, IMPOSS's reason
for existing should be strengthened when the simple MEMBER function
calls replace the slow IS-OF-TYPE BEING calls.
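The logic of the synthesized predicate can be transliterated into modern Python (a sketch for the reader; PUP6's actual output is the LISP COND shown above): a MUSTNOT feature contradicts a scene containing it, a MUST feature contradicts a scene lacking it, and a MAY feature never produces a contradiction.

```python
# IMPOSS, the specialized contradiction test, in Python dress.
def imposs(f, s_features, mustnot, must):
    if f in mustnot:
        return f in s_features        # probability=0 contradiction
    if f in must:
        return f not in s_features    # probability=1 contradiction
    return False                      # the "T ... NIL" clause: MAY features

mustnot, must = {"(RED A)"}, {"(SUPPORTS A B)"}
scene = {"(BLOCK A)", "(GREEN B)", "(TOUCHES A B)"}
assert imposs("(SUPPORTS A B)", scene, mustnot, must)     # MUST, but absent
assert not imposs("(TOUCHES A B)", scene, mustnot, must)  # a MAY feature
```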
Most of this processing is inter-BEING activity; the user is
not needed for -- or even informed of -- most of it.
Alas, "the user" is not generic; there
was only one successful user.
.B
.E
.ONCE CENTER
⊗5↓_11. Excerpt from the synthesized program itself running_↓⊗*
The PUP6 system interacts with the user, resulting in the creation of
CF, the desired target program. Like PUP6, CF is nothing but a pool of
BEINGs. So, as it runs, the user can interrupt it and ask it questions
about what it's doing. The answers are typically just some organs of the
BEING currently in control.
Any BEING on the control stack will provide fully instantiated
answers to any of its 29 allowable queries; all other BEINGs will provide only
hypothetical answers.
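The interruptibility mechanism might be sketched as follows (Python; the representation of a BEING and its bindings is an illustrative assumption): a BEING on the control stack answers a query with its instantiated organs, while any other BEING can answer only hypothetically.

```python
# A BEING on the control stack fills its organ text in with its
# current bindings; any other BEING answers only hypothetically.
def query(being, organ, control_stack):
    answer = being["organs"].get(organ, "no such organ")
    if being["name"] in control_stack:
        return answer.format(**being["bindings"])
    return "(hypothetically) " + answer

part = {"name": "PARTITION-DOMAIN",
        "organs": {"WHAT": "divide {domain} into subdomains"},
        "bindings": {"domain": "the space of scenes"}}
stack = ["CF", "PARTITION-DOMAIN"]
assert "scenes" in query(part, "WHAT", stack)
assert query(part, "WHAT", []).startswith("(hypothetically)")
```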
An excerpt of CF running without interruption
is given below, followed by the same piece with interruptions.
The characters of the dialogues with PUP6 to write CF, and with CF to
learn concepts, are similar; an "interrupted" excerpt of the user-PUP6
dialogue will be omitted.
Some liberty has been taken with the
English (e.g., ⊗4WHAT?⊗*, not ⊗4What are you doing?⊗*).
CF types in boldface, the user in italics.
.BEGIN NOFILL FLUSH LEFT SELECT 2
READY TO ACCEPT BRAND NEW SCENE: ⊗4(??? (A B) (Block A) (Green B) (Touches A B))⊗*
NOT AN ARCH. NOT A TOWER. MIGHT BE A ROW. MY GUESS IS ROW.
WHAT IS IT REALLY? ⊗4Nothing⊗*
TRANSFER FROM MAY TO MUST SETS OF ROW: (TOUCH A B)
READY TO ACCEPT BRAND NEW SCENE:
.END
Here is the ⊗4same⊗* excerpt, but with the user interrupting occasionally.
"⊗4U:⊗*"
signifies he has interrupted,
and "⊗4OK.⊗*" signifies he has told CF "⊗4OK, go back to work now.⊗*"
.BEGIN NOFILL FLUSH LEFT SELECT 2
⊗4U: What are you doing?⊗*
CF: Divide a domain into subdomains, using specialized version of PARTITION-DOMAIN
⊗4How?⊗*
CF: By building up the partition gradually, by adding one <scene, name> pair. ⊗4OK.⊗*
READY TO ACCEPT BRAND NEW SCENE: ⊗4(??? (A B) (Block A) (Green B) (Touches A B))⊗*
⊗4U: What are you doing?⊗*
CF: Break NAME piece off (??? (A B) (BLOCK A) (GREEN B) (TOUCHES A B)) to examine.
⊗4What code is the current BEING going to execute?⊗*
CF: The meta-code of SOME-PART-OF↓1↓4 is (CAR arg1). Evals to ???. ⊗4OK.⊗*
NOT AN ARCH.
⊗4U: Why?⊗*
CF: The ARCH MUST-relation (SUPPORTS A B) is contradicted in the SCENE. ⊗4OK.⊗*
NOT A TOWER. MIGHT BE A ROW. MY GUESS IS ROW.
WHAT IS IT REALLY? ⊗4U: Why are you asking?⊗*
CF: CF cannot resolve it, and must at this time, so CF asks USER
⊗4OK. What you wanted is: Nothing⊗*
⊗4U: What are you doing?⊗*
CF: Go through space of concept names, and make user aware of changes.
⊗4What BEINGs might this affect?⊗*
CF: MESSAGE is possibly called; some version of TEST is possibly called. ⊗4OK.⊗*
TRANSFER FROM MAY TO MUST SETS OF ROW: (TOUCH A B)
READY TO ACCEPT BRAND NEW SCENE:
.END
.B
.E
.ONCE CENTER
⊗5↓_12. Other Tasks_↓⊗*
Let us briefly describe GI and PL, the second and
third programs PUP6 synthesized. GI is a simple
grammatical inference program. It builds up a set of plausible rules,
rather than enumerating through the space of all grammars. Yet it
lacks most of the "tricks" of the best g.i. programs. The user
repeatedly specifies a string, labelled LEGAL, ILLEGAL, or ??. In the
latter case, GI must print out its guess and accept the correct label
from the user. In all three cases, it must update its set of
plausible rules to be at least consistent with all positive and
negative instances observed. In some versions of GI which the user coaxes
PUP6 to generate, a parse tree may be maintained and inspected; in
other versions, just a list of the rules used during a parse is kept.
PL is a data-retrieval-on-keys system, which maintains a
property list structure.
The user repeatedly types
in a request to insert, delete, or inspect a record (e.g., INSERT:
PASSENGER Jones FLIGHT 741 FROM SFO TO LAX CARRIER TWA LEAVE 7:15
ARRIVE 8:00). Any unspecified fields are treated as don't cares;
thus the request is matched against entries in the data base.
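PL's retrieval rule can be sketched in a few lines (Python; the record layout is taken from the INSERT example above, and delete-by-pattern is the modification described in the next paragraph): unspecified fields are don't-cares, so a request matches any record agreeing on every field it does specify.

```python
# A request matches a record iff every field the request specifies
# agrees; fields the request omits are don't-cares.
def matches(request, record):
    return all(record.get(k) == v for k, v in request.items())

db = [{"PASSENGER": "Jones", "FLIGHT": 741, "FROM": "SFO", "TO": "LAX"},
      {"PASSENGER": "Jones", "FLIGHT": 12,  "FROM": "LAX", "TO": "JFK"}]
# Inspect: every Jones reservation matches the partial request.
assert sum(matches({"PASSENGER": "Jones"}, r) for r in db) == 2
# Delete-by-pattern: purge all reservations matching the pattern.
db = [r for r in db if not matches({"FROM": "SFO"}, r)]
assert len(db) == 1
```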
.BEGIN GROUP
The table below shows how each type of knowledge was used in
writing the three target programs. All numbers refer to BEINGs.
.NOFILL INDENT 6 SELECT 7
.ONCE CENTER
⊗5U S E D I N S Y N T H E S I Z I N G⊗*
TYPE OF          CF   CF   CF   CF    GI    PL    not   Crea- Crea- Crea- Total  Total
KNOWLEDGE        GI   GI   PL   only  only  only  used  ted   ted   ted   BEINGs BEINGs
                 PL                               ever  in CF in GI in PL        in PUP6
Programming      39    7    5    5     3     1    14    52    27    21    174     74
Domain-Specific   0    3    0    9     8     1     5     4    10     3     43     26
Total BEINGs     39   10    5   14    11     2    19    56    37    24    217    100
.END
"Created" means "written by PUP6 during the dialogue which synthesized".
Notice the percentage of programming BEINGs which were used in all
three dialogues (39/74). The three domain-specific BEINGs used
in both the CF and GI sessions indicate that they may be specific to the
inductive inference domain. They deal with partitioning a domain,
and aren't used in PL, which is not an
inductive inference task.
Although BEINGs can theoretically handle
user errors, PUP6 was set up expecting that the user would never err
or forget. He could, after the program was finished, try it out and
describe how he wished it changed. Occasionally, PUP6 actually makes
the right modifications. For example,
after the synthesis of PL is finished, the user tries
out the program and finds that he is unable to delete more than one
reservation with any single command. He complains about this, and
PUP6 modifies the program so that he may specify a pattern, and all
reservations matching that pattern will be purged.
There is only one sentence, however, which is able to effect this change;
the slightest deviation would cause it to be misunderstood!
.B
.E
.ONCE CENTER
⊗5↓_13. Numerical efficiency data_↓⊗*
Good measures of concentration of intelligence are not yet
available. The only alternative is to present ephemeral numerical
efficiency data, which now follows. PUP6 is 200 pages long when
PRETTY-PRINTED, and has synthesized three programs, 7, 20, and 30
pages long (1, 4, and 6 pages, when coded by hand.)
Two important factors are omitted when simply comparing system
and target lengths. First, one might improve any such measure by simply
decreasing the size of a given system. This is wrong, since, e.g.,
removing all the comments from a program shouldn't increase its
intelligence rating! Secondly, the total volume of distinct target programs
should be considered. Compilers turn out programs small compared to
themselves, but are valuable because of the vast variety of such programs
they can put together.
This factor reflects how much of the "insides" of the system can be used in
writing many programs.
PUP6 takes 60 cpu minutes to produce CF (compiled INTERLISP
code, run on a PDP-10 TENEX system). During this time,
300K characters get typed out to the user, who responds with about
4K himself. The dialogue lengths are best specified in characters,
since often the user will supply simply a number or a letter or a
YES/NO reply.
Even the character counts are
very rough, because the format of the AP dialogue can easily be
varied by a couple of orders of magnitude. The mean wait time (between
the user's input and the next machine output) is several seconds. The
longest delay between successive user inputs is 1 CPU minute.
Unfortunately, PUP6 requires 100K to run in, which (with INTERLISP)
exhausts the virtual memory of the current hardware being used. One
would lose vast amounts of time by using intermediate storage devices
or by running consecutive communicating jobs. Efficient use of multiple
processors and of extended core storage might be made.
PL was one fifth the size of CF, and took about a seventh
as long to generate. The dialogue was shorter by only
a factor of three. The
dialogue for the CF target program was too long and tiresome; the problem
was even more acute for the PL program.
.B
.E
.ONCE CENTER
⊗5↓_14. Conclusions_↓⊗*
The main successes of the experiment were that the desired
reasoning steps in the original protocol
were actually simulated, most of the BEINGs were used
in writing more than one of the programs, and the synthesized code
could be interrupted and asked a few kinds of questions. The main
defects were the inflexibility of the system to new dialogues, the
inability for the user to add new, high-level, domain-specific
knowledge, the verbosity of the system, and its ill-founded dependence on
user reliability. These problems seem to
be inherent in the task of nonformal automatic synthesis of large programs.
The small fraction of organs which were actually relevant to each BEING (a
third) indicates that this scheme should be used as one mechanism in
some AI systems, but not as the sole curator of information and control.
The fact that target code is in the format of BEINGs limits
the size of the target programs (⊗7≥⊗* one page) and their efficiency (a
BEING-call is a very slow and intricate process) and of course their
style. The most startling result -- which should have been foreseen --
was that the synthesized code has all the properties of BEINGs.
This was mentioned
casually earlier,
but is worth restating: the generated code
is self-commenting in the strong sense that it
can answer any
(of our set of 29) questions about itself. Those which make sense at
compile-time can be asked then; those which make sense during
execution may be asked then.
The set of questions the user was expected to want to ask the
system is similar to the questions one BEING wants to ask another.
So when the "nice" user interrupts, his questions are almost always a
simple retrieval from a property list (a GETP or a composition like
EVAL o GETP). When the average user interrupts, his questions are
often unintelligible to PUP6.
Meaningful use of comments proved helpful. As an example, the
comment at the end of the main outer loop was "COMMENT, INFINITE LOOP
IN THIS PROG" for most of the CF-producing session. Near the end
of the dialogue, the CLARIFY-IMPROBABLE-SITUATION BEING recognizes
this as a warning, works on introducing a breakaway test, and then
replaces the comment by one indicating that no infinite loop exists
there anymore. The advantage of embedding our
insertions in the code in this way is that, at any stage, the user can
inspect the code and get something meaningful out of the comments
which are present.
The careful bookkeeping actually did eliminate some
carelessness errors, though it had no effect on user errors or later
program maintenance directives. This last process is less
ill-defined than blind debugging, and in fact is similar to
programming itself. The deferral of decisions has the
corollary of reducing (though not minimizing) communication with
the slow user channel.
Synthesizing a new,
clean target program probably would require
adding a few domain-independent BEINGs and several domain-specific
BEINGs, and generalizing a few organs of some existing BEINGs.
Hopefully, no new organ would need to be added to BEING anatomy.
Since the dialogues for GI and PL were not studied beforehand,
their acceptability would have demonstrated the success of the
system. Most of the existing BEINGs were used in generating these
new programs, but a few new BEINGs had to be added. This addition
process required some detailed knowledge of the PUP6 system, not just of
the domain. Equally
discouraging, the dialogue to write PL is too long and detailed
for the task at hand. It also seems that the front end is too
brittle to allow anyone unfamiliar with the system to easily work
with it.
The need for flexible communication stems only
partially from inter-user differences. A serious and unexpected
source of conflict here is the amount of inference the user wants
done. This level is relatively subjective, and may fluctuate rapidly
even with one user writing one program. Perhaps there should be a
dial he can set. At one extreme, the system would ask the user to
verify even the most trivial inductions. At the other extreme
setting, the system would probably write the target program after
just a few brief user inputs. The user would then try out one program
after another, interjecting just one or two comments each time, after
which the system would come up with the next version. This program
modification mode seems appropriate for someone ignorant of LISP, who
nevertheless has the I/O behavior of his target clearly in mind.
Some of the BEING organs proved embarrassingly unnecessary.
The CO-REQUISITES organ was never used. The only activity which had
to be done concurrently was demon activation. The AFFECTS organ was of
interest to the user occasionally, but never to any BEING. The
EFFECTS organ originally had a twin, describing what would happen if
each BEING were ⊗4not⊗* called right now. In each case, this was
merely a trivial restatement of the frame problem. So, like STRIPS,
PUP6 ignores the frame problem: all changes must be
explicit.
Many of PUP6's comments are boring and unnecessary; this was
felt to be an engineering problem which would be ignored in all but a
"marketable" AP system. This view was probably incorrect. One of the
most severe AP problems is having the system inform the user of
precisely what he should know -- no more nor less. This requires a
sophisticated user model cleverly interfaced to the current
programming task.
Another problem with large, long dialogues is user error. A
human has great difficulty keeping "on top" of everything. He may
get lost, forget, change his mind, or misunderstand. Again, this
problem was originally considered ignorable, but has shown itself to
be a limiting practical factor in wide accessibility. It was
necessary to create a forgetful-user demon to prevent anaphoric
reference to something which, though uniquely determined, hadn't been
mentioned within the past several seconds of real time.
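The forgetful-user demon might be sketched as follows (Python; the threshold and function names are illustrative assumptions, not PUP6's actual parameters): an anaphoric reference is accepted only if its referent was mentioned recently enough, and is otherwise rejected so the system must name the object explicitly.

```python
# An anaphoric reference ("it", "that function") is resolved only if
# its uniquely determined referent was mentioned within the recency
# limit; otherwise the demon forces an explicit name.
RECENCY_LIMIT = 20          # seconds of real time; an assumed threshold

def resolve_anaphor(candidate, now):
    """candidate: (referent, time of its last mention)."""
    referent, mentioned_at = candidate
    if now - mentioned_at <= RECENCY_LIMIT:
        return referent
    return None             # reject: re-describe the object explicitly

assert resolve_anaphor(("IMPOSS", 100), now=110) == "IMPOSS"
assert resolve_anaphor(("IMPOSS", 100), now=200) is None
```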
Related to this is the problem of keeping
the user informed of where he is. PUP6 simulated a continuous display
of the graph of the current partial program, augmented by static and
dynamic cursors. This simple
use of dynamic and textual indices seems
insufficient. The user was still often confused, wishing that a
section be referred to neither by pointing nor by name,
but rather by a brief phrase meaningful (to him
only!). This may also be one of the major AP problems which
must be studied further.
Will we ever need a
system whose target programs will be longer and more complex than the
system itself?
Why couldn't we settle for a truly intelligent
AP system several times as large as anything it
could produce?
Worries about large dialogues arise because future AP
systems are viewed as tools for writing programs which humans simply
cannot manage currently: viable natural language systems, huge
operating systems, better AP systems.
Is there any evidence that such hyper-productive systems could ever
be written? There seems
to be a manageable body of information which one needs to master
to do programming. One can
write programs as complex as he wishes, provided the program is designed in a
clean way. Thus the size of AP systems will probably
level off (albeit at a size huge
compared to existing programs) even as the size and complexity of
their generated code continue to increase and, eventually, surpass
them.
.B
.E
.ONCE CENTER
⊗5↓_References_↓⊗*
.SPACING 35 MILLS
.PREFACE 110 MILLS
[1] Balzer, Robert, ⊗4Automatic Programming⊗*, Institute Technical
Memorandum, University of Southern California
Information Sciences Institute, September, 1972.
[2] Gadwa, Peter, ⊗4SPOT, a mini concept formation program⊗*,
Master's Thesis, Artificial Intelligence Laboratory,
Stanford University, Stanford, California, August, 1973.
[3] Green, Waldinger, Barstow, Elschlager, Lenat, McCune, Shaw, and
Steinberg,
⊗4Progress Report on Program-Understanding Systems⊗*, Memo AIM-240,
CS Report STAN-CS-74-444, Artificial Intelligence Laboratory,
Stanford University, August, 1974.
[4] Hempel, Carl G., ⊗4Fundamentals of Concept Formation in
Empirical Science⊗*, University of Chicago, Chicago, Illinois,
1952.
[5] Hewitt, Carl, ⊗4A Universal Modular ACTOR Formalism for
Artificial Intelligence⊗*, 3rd International Joint Conference on
Artificial Intelligence,
1973, pp. 235-245.
[6] Minsky, Marvin, ⊗4Frames⊗*, in ⊗4Psychology of Computer
Vision⊗*, 1974.
[7] Minsky, Marvin, and Papert, Seymour, ⊗4Artificial
Intelligence Progress Report⊗*, MIT Project MAC, AI Memo
252, 1972.
[8] Newell, Allen, and Simon, Herbert,
⊗4Human Problem Solving⊗*, 1973.
[9] Teitelman, Warren, ⊗4INTERLISP Reference
Manual⊗*, XEROX PARC, 1974.
[10] Winston, Patrick, ⊗4Learning Structural Descriptions
from Examples⊗*, Ph.D. thesis, Dept. of Electrical Engineering,
Project MAC TR-76, MIT AI Lab TR-231,
September, 1970.
The ideas and the system described use concepts from ACTORS,
heterarchy, structured programming, assertional data bases,
flexible data types, pattern-directed invocation of procedural
knowledge, the paradigm of dialogue, studies on program specification,
QLISP, INTERLISP, LISP, English,... In particular, the author
drew heavily from creative discussions with C. Green, R. Waldinger,
D. Barstow, B. McCune, D. Shaw, E. Sacerdoti, and L. Steinberg.
Computer time for the research was generously provided by the
Artificial Intelligence Center of SRI.
.SKIP 2
.ONCE CENTER
⊗5↓_Appendix: the BEINGs_↓⊗*
.PREFACE 300 MILLS
Space does not permit detailed discussion of each BEING, but the
names are sufficiently suggestive to give the reader a rough
characterization of the scope of the knowledge in the PUP6 system.
.SELECT 3
ADAPT:PRECONCEIVED:FUNCTION
ADD:DEFINITION
ADJECTIVE:HANDLER
ANALYZE:IMPLICATIONS
APPLYRULE
ASK:USER:ABOUT
BETTER
CHOOSE:FROM
CLARIFY:IMPROBABLE:SITUATION
CLASSIFICATORY:CONCEPT:FORMATION
COMPARE
COMPARITIVE:CONCEPT:FORMATION
COMPLEX:ALTERATION
COMPLEX:COMPARE:FN
COMPLEX:MODIFY:STRUCTURE
CONCEPT:FORMATION
CONDITIONAL:DELETION
CONDITIONAL:INSERTION
CONSTRAIN
CONTRADICTS
DATA:STRUCTURE:DELETIONS
DATA:STRUCTURE:INSERTIONS
DEFER:DECISION
ELEMENT
ENCODE
EXAMINE:STRUCTURE
EXTRACT:RELEVANT:SUBSET
FAST:GET:NAME
FAST:SATISFY
FILL:IN:UNDEFINED:SECTION
FIX:INCORRECT:PIECE
FOREACH
GET:DATA:STRUCTURE
GET:HOLD:OF
GET:NAME
GET:NEW:INFORMATION
GRAMMATICAL:INFERENCE
INFER:CONTEXTFREE:GRAMMARS
INFER:CONTEXTSENSITIVE:GRAMMARS
INFER:FIXEDCLASS:GRAMMARS
INFER:MULTICLASS:GRAMMARS
INFER:PHRASESTRUCTURE:GRAMMARS
INFER:REGULAR:GRAMMARS
IS:OF:TYPE
JOINING:FUNCTION
LIST:STRUCTURE
MAJOR:MODIFY:STRUCTURE
MAKE:A:GUESS
MAKE:ENCODABLE
MAKE:NEW:BEING
MESSAGE
METRICAL:CONCEPT:FORMATION
MODIFY:SOME
MODIFY:STRUCTURE
MODIFY:UNTIL
OBTAIN:USABLE:INFORMATION
OPTIMIZE
PARSE
PARSE:BACKWARD
PARSE:FORWARD
PARTITION:A:DOMAIN
PARTITION:BY:TAKE:CLASS:GET:ELE
PARTITION:BY:TAKE:ELE:AND:CLASS
PARTITION:BY:TAKE:ELE:GET:CLASS
PATTERN:MATCH
PROBABILITY=0:#
PROBABILITY=1:#
PROBABILITY>0&<1:#
PROPOSE:PLAUSIBLE:NAMES
RECOGNIZE:ARGS
RECOGNIZE:CAR
RECOGNIZE:CONDITIONAL
RECOGNIZE:CONJUNCTION
RECOGNIZE:EQUALITY
RECOGNIZE:FUNCTION:RETURNS
RECOGNIZE:INCLUSION
RECOGNIZE:LITERALS
RECOGNIZE:NUMBER
RECOGNIZE:SET:RELATIONS
RECOGNIZE:SOME:MEMBER
RECOGNIZE:TAIL
REINVESTIGATE:DECISION
REPEATEDLY
RESOLVE:DECISION
SATISFY
SCENE
SEARCH
SERVE
SIMPLE:COMPARE:FN
SOME:PART:OF
STRING
STRUCTURE-INDUCE
STUDY:TYPE
SUPPORT&DUMP
TAKE:HOLD:OF
TEST
TRANSLATE
TRICHOTOMY
USE:INFORMATION
UTILIZE
WHEN:NEXT
WRITE:PROGRAM
.B
.E
.ONCE CENTER